30 research outputs found

    On-stack replacement, distilled

    Get PDF
    On-stack replacement (OSR) is essential technology for adaptive optimization, allowing changes to code actively executing in a managed runtime. The engineering aspects of OSR are well-known among VM architects, with several implementations available to date. However, OSR is yet to be explored as a general means to transfer execution between related program versions, which can pave the road to unprecedented applications that stretch beyond VMs. We aim at filling this gap with a constructive and provably correct OSR framework, allowing a class of general-purpose transformation functions to yield a special-purpose replacement. We describe and evaluate an implementation of our technique in LLVM. As a novel application of OSR, we present a feasibility study on debugging of optimized code, showing how our techniques can be used to fix variables holding incorrect values at breakpoints due to optimizations

    Write-rationing garbage collection for hybrid memories

    Get PDF
    Emerging Non-Volatile Memory (NVM) technologies offer high capacity and energy efficiency compared to DRAM, but suffer from limited write endurance and longer latencies. Prior work seeks the best of both technologies by combining DRAM and NVM in hybrid memories to attain low latency, high capacity, energy efficiency, and durability. Coarse-grained hardware and OS optimizations then spread writes out (wear-leveling) and place highly mutated pages in DRAM to extend NVM lifetimes. Unfortunately even with these coarse-grained methods, popular Java applications exact impractical NVM lifetimes of 4 years or less. This paper shows how to make hybrid memories practical, without changing the programming model, by enhancing garbage collection in managed language runtimes. We find object write behaviors offer two opportunities: (1) 70% of writes occur to newly allocated objects, and (2) 2% of objects capture 81% of writes to mature objects. We introduce writerationing garbage collectors that exploit these fine-grained behaviors. They extend NVM lifetimes by placing highly mutated objects in DRAM and read-mostly objects in NVM. We implement two such systems. (1) Kingsguard-nursery places new allocation in DRAM and survivors in NVM, reducing NVM writes by 5x versus NVM only with wear-leveling. (2) Kingsguard-writers (KG-W) places nursery objects in DRAM and survivors in a DRAM observer space. It monitors all mature object writes and moves unwritten mature objects from DRAM to NVM. Because most mature objects are unwritten, KG-W exploits NVM capacity while increasing NVM lifetimes by 11x. It reduces the energy-delay product by 32% over DRAM-only and 29% over NVM-only. This work opens up new avenues for making hybrid memories practical

    A Simple Graph-Based Intermediate Representation

    No full text
    We present a graph-based intermediate representation (IR) with simple semantics and a low-memory-cost C++ implementation. The IR uses a directed graph with labeled vertices and ordered inputs but unordered outputs. Vertices are labeled with opcodes, edges are unlabeled. We represent the CFG and basic blocks with the same vertex and edge structures. Each opcode is defined by a C++ class that encapsulates opcode-specific data and behavior. We use inheritance to abstract common opcode behavior, allowing new opcodes to be easily defined from old ones. The resulting IR is simple, fast and easy to use. 1. Introduction Intermediate representations do not exist in a vacuum. They are the stepping stone from what the programmer wrote to what the machine understands. Intermediate representations must bridge a large semantic gap (for example, from FORTRAN 90 vector operations to a 3-address add in some machine code). During the translation from a high-level language to machine code, an optimizin..

    Compiler Support for Out-of-Core Arrays on Parallel Machines

    No full text
    Many computational methods are currently limited by the size of physical memory, the latency of disk storage, and the difficulty of writing an efficient outof -core version of the application. We are investigating a compiler-based approach to the above problem. In general, our compiler techniques attempt to choreograph I/O for an application based on high-level programmer annotations similar to Fortran D's DECOMPOSITION, ALIGN, and DISTRIBUTE statements. The central problem is to generate "deferred routines" which delay computations until all the data they require have been read into main memory. We present the results for two applications, LU factorization and red-black relaxation, on 1 to 32 nodes of an Intel Paragon after hand application of these compiler techniques. 1 Introduction Improvements in processor performance have outpaced developments in both memory and disk I/O speed. As a result, out-of-core applications, which require significantly more data than will fit into RAM,..

    Compiler Support for Out-of-Core Arrays on Parallel Machines

    No full text
    Many computational methods are currently limited by the size of physical memory, the latency of disk storage, and the difficulty of writing an efficient outof -core version of the application. We are investigating a compiler-based approach to the above problem. In general, our compiler techniques attempt to choreograph I/O for an application based on high-level programmer annotations similar to Fortran D's DECOMPOSITION, ALIGN, and DISTRIBUTE statements. The central problem is to generate "deferred routines" which delay computations until all the data they require have been read into main memory. We present the results for two applications, LU factorization and red-black relaxation, on 1 to 32 nodes of an Intel Paragon after hand application of these compiler techniques. 1 Introduction Improvements in processor performance have outpaced developments in both memory and disk I/O speed. As a result, out-of-core applications, which require significantly more data than will fit into RAM,..

    A Model and Compilation Strategy for Out-of-Core Data Parallel Programs

    No full text
    It is widely acknowledged in high-performance computing circles that parallel input/output needs substantial improvement in order to make scalable computers truly usable. We present a data storage model that allows processors independent access to their own data and a corresponding compilation strategy that integrates data-parallel computation with data distribution for out-of-core problems. Our results compare several communication methods and I/O optimizations using two out-of-core problems, Jacobi iteration and LU factorization. A Model and Compilation Strategy for Out-of-Core Data Parallel Programs Rajesh Bordawekar Alok Choudhary Ken Kennedy y Charles Koelbel y Michael Paleczny y 1 Introduction There can be no argument that high-performance I/O is essential to high-performance computing. Many users see parallelism as the best way to achieve this performance, thus motivating the calls for "parallel I/O." Informal surveys of users of high-performance computing have ..

    Snippets

    No full text

    A model and compilation strategy for out-of-core data parallel programs

    Full text link
    http://infobib.de/blog/2009/03/25/materialsammlung-rund-um-den-heidelberger-appel

    A model and compilation strategy for out-of-core data parallel programs

    No full text
    It is widely acknowledged in high-performance computing circles that parallel input/output needs substantial improvement in order to make scalable computers truly usable. We present a data storage model that allows processors independent access to their own data and a corresponding compilation strategy that integrates data-parallel computation with data distribution for out-of-core problems. Our results compare several communication methods and I/O optimizations using two out-of-core problems, Jacobi iteration and LU factorization.
    corecore